Abstract

By providing direct data transfer between storage and client, network-attached storage devices have the potential to improve scalability (by removing the server as a bottleneck) and performance (through network striping and shorter data paths). Realizing the technology’s full potential requires careful consideration across a wide range of file system, networking and security issues. To address these issues, this paper presents two new network-attached storage architectures. (1) Networked SCSI disks (NetSCSI) are network-attached storage devices with minimal changes from the familiar SCSI interface. (2) Network-attached secure disks (NASD) are drives that support independent client access to drive-provided object services. For both architectures, we sketch a repartitioning of distributed file system functionality, including a security framework whose strongest levels use tamper-resistant processing in the disks to provide action authorization and data privacy even when the drive is in a physically insecure location.

We used AFS and NFS traces to evaluate each architecture’s potential to decrease file server workload. Our results suggest that NetSCSI can reduce file server load during a burst of AFS activity by a factor of about 2. For the NASD architecture, server load during burst activity can be reduced by a factor of about 4 for AFS and 10 for NFS.
Introduction

Evolving distributed storage technology for higher-performance computing

Distributed file systems provide remote access to common file storage in a networked environment. They enable the users of a group of computers to work as though they were sharing a single large file system [Sandberg85, Howard88, Minshall94].

A principal measure of a distributed file system’s cost is the computational power required from the servers to provide adequate performance for each client’s work [Howard88, Nelson88]. AFS helps reduce server load by using each client’s local disk to cache a subset of the global system’s files, allowing a client’s local file system cache to handle a large fraction of its distributed file system accesses without contacting the server. This enables AFS servers to support more clients than the servers of distributed file systems, such as NFS, whose client caching is limited to the client’s in-memory file buffer cache. Of course, in systems that provide strong caching semantics (e.g., AFS), maintaining the consistency of client caches introduces a new, albeit much smaller, computational load on file servers. This load increases as clients cache more aggressively. In addition, there are limits to the effectiveness of client caching: if nothing else, servers must still serve first-reference reads and misses caused by invalidations in client caches.

In large shared file systems, this remaining workload is too large for a single traditional file server. One way to handle the load is to use multiple servers. Multiple-server distributed file systems attempt to balance the load by replicating static, commonly used files and by partitioning the namespace of the remaining files (that is, placing files that are clustered in the same area of the global directory tree on the same server). Namespace balancing is effective when it divides files into sets corresponding to essentially non-overlapping organizational units, but such units are often too large to be serviced by a single low-cost server. Hence, many installations either split the namespace of a single organizational unit over multiple servers or resort to specialized super-fileservers that are large enough to centrally manage all storage for an organizational unit [Hitz90, Drapeau94]. Splitting the namespace leads to the “hotspot” problem familiar from multiple-disk mainframe experience [Kim86], and can require frequent user-directed namespace adjustment. Super-fileservers can provide good performance, but they are an expensive solution.

Experience with disk arrays suggests another solution. If the data is striped over multiple independent controllers, then a high-concurrency workload whose individual accesses are small relative to the unit of interleaving will be balanced with high probability [Livny87, Chen90]. Striping metadata provides similar load balancing for metadata operations [Dahlin95].
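The load-balancing argument rests on a simple address mapping. The sketch below assumes RAID-0-style round-robin striping with a fixed stripe unit measured in blocks. The function names `locate_block` and `load_per_disk` and all parameter values are illustrative and are not taken from any cited system. The sketch maps a logical block to a (disk, offset) pair and then tallies where many small, randomly placed accesses land.

```python
from __future__ import annotations

import random
from collections import Counter


def locate_block(logical_block: int, num_disks: int, stripe_unit: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    Assumes round-robin (RAID-0-style) striping: consecutive runs of
    `stripe_unit` blocks are placed on consecutive disks.
    """
    stripe_number = logical_block // stripe_unit      # which stripe unit holds the block
    disk = stripe_number % num_disks                  # stripe units rotate across disks
    row = stripe_number // num_disks                  # full rotations that precede it
    offset = row * stripe_unit + logical_block % stripe_unit
    return disk, offset


def load_per_disk(num_accesses: int, num_disks: int, stripe_unit: int,
                  address_space: int, seed: int = 1) -> Counter:
    """Count how many small single-block accesses land on each disk."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(num_accesses):
        disk, _offset = locate_block(rng.randrange(address_space), num_disks, stripe_unit)
        counts[disk] += 1
    return counts


if __name__ == "__main__":
    print(locate_block(33, num_disks=4, stripe_unit=8))   # (0, 9)
    print(sorted(load_per_disk(10_000, 4, 8, 1_000_000).items()))
```

When small accesses are independent and uniformly placed, each of the four disks receives close to a quarter of the load. Skew in a real workload would reduce that balance.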

Lowering the workload that each client applies to servers (through client caching) and balancing it (through storage striping) provide excellent cost control, but they do not ensure high performance for clients. With exponential increases in microprocessor performance and the improvements in workstation memory bandwidth required to support them (such as Rambus [Rambus92] or Synchronous DRAM [Toshiba]), ubiquitous personal workstations are increasingly capable of high-performance data processing. Relying on caching to satisfy the data throughput needs of such high-performance clients would require cache miss rates to decrease proportionately. Unfortunately, increasing computation sizes, file sizes, and workgroup sharing are all blocking the needed decrease in miss rates [Ousterhout85, Baker91], while increased client cache sizes are making those misses more bursty. Thus, if client performance is to improve, the performance of distributed file systems in servicing client cache misses must also improve.

This is the argument that led storage subsystem designers to develop disk arrays: striping storage promises parallel transfer of large files and load balancing of high-concurrency workloads [Patterson88]. For distributed file systems, striping storage over multiple servers promises scalable storage bandwidth [Hartman93] as long as the network can sustain the communication load. Cost and scalability concerns prohibit the use of a single shared-media network whose peak capacity meets the maximum communication load. However, switched network fabrics based on point-to-point links are now widely accepted. Examples include switched Fast Ethernet, switched FDDI, ATM, and Myrinet, whose links have capacities of 100 Mbit/sec to 800 Mbit/sec. In these fabrics, striped storage bandwidth can scale up to the limitations of the client links, independent of other traffic in the same fabric [Arnould89, Siu95, Boden95]. Of course, a client’s network performance is limited by far more than its link’s raw bandwidth. Fortunately, there has been substantial research progress toward overcoming many of these performance limitations. Powerful interface board designs [Steenkiste94, Cooper90, FORE94], integrated layer processing for network protocols [Clark89], direct application access to the network interface [vonEiken92, Maeda93], copy-avoiding buffering schemes [Druschel93], and routing support for high-performance best-effort traffic [Ma96, Traw95] are all increasing the bandwidth available to client applications.

In practice, distributed file systems are often built through a series of small purchases made by small groups. Invariably, these small groups are primarily interested in buying client machines. However, the economics of providing a high-performance striped distributed file system are like those of purchasing an expensive centralized mainframe: a large investment requiring the financial collaboration of many small groups. One way to avoid both synchronizing purchases and “taxing” the budget of each new machine’s purchaser is to tax the machine itself instead. The xFS file system does this by distributing code, metadata and data over all clients, eliminating the need for a centralized storage system [Dahlin95]. This scheme naturally matches increasing client performance with increasing server performance. Instead of reducing the server workload, it takes the required computational power from another, frequently idle, client. The network-attached storage architectures presented in this paper significantly reduce the demand for server computation and eliminate file server machines from the storage data path without coupling overall file system integrity to the security of each client machine.

Storage technology’s concurrent evolution

As distributed file system technology has improved, so have the storage technologies employed by these systems. Storage devices are primarily fixed-surface, flying-head magnetic (hard) disks, a technology developed over three decades ago. They have evolved to provide increasing density, increasing data rates and seek speeds, and increasing embedded intelligence. Storage density increases, long a predictable 25% per year, have been delivering 60% per year during the 90s. Prior to the mid-80s, data rates were constrained by storage interface definitions, but they have increased by about 40% per year in the 90s [Grochowski96].

The primary reason for storage’s recent accelerated rate of evolution has been the broad acceptance of the Small Computer System Interface (SCSI) standard [ANSI86]. SCSI abstracted the device as a linear array of fixed-size blocks behind an embedded command-interpreting controller. It linked devices to a computer’s I/O bus over a shared, relatively high-speed bus through a “host-bus adapter”. Pre-SCSI storage devices had interfaces that exposed data rates, disk geometry [McKusick84], data-dependent addressing [Ahearn72], and bufferless speed matching (causing so-called rotational positioning reconnect miss delays [Buzen86]). In contrast, SCSI decouples storage component interfaces from the host’s storage interface, enabling rapid introduction of incremental technology advances.

Moreover, SCSI (and its less-expensive contemporary, IDE) has been adopted across a broad range of the marketplace. This adoption has eliminated the customer’s compelling motivation to purchase storage from the vertically integrated provider of the rest of the computer system, and so has increased competition among disk drive manufacturers. The result, 60% per year density increases and 40% per year data rate increases, is now yielding surface densities over 1.3 Gbit per square inch [IBM96], unit capacities over 10 GBytes [Seagate96b] and sustained data rates up to 12 MBytes/sec [Seagate96a]. At this rate of improvement, we can expect data rates in excess of 40 MBytes/sec by the end of the decade.
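The projection is compound growth. The calculation below takes the 12 MBytes/sec figure as a 1996 baseline. It assumes that "the end of the decade" means the year 2000, because the text does not give a year. Under that assumption, four years of 40% annual growth give:

```latex
R_{2000} \approx R_{1996}\,(1 + g)^{4}
        = 12\ \text{MBytes/sec} \times 1.4^{4}
        \approx 12 \times 3.84
        \approx 46\ \text{MBytes/sec}
```

Counting only three years of growth (to 1999) gives about 12 × 1.4³ ≈ 33 MBytes/sec. The estimate therefore depends on where the decade is taken to end.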

The level of indirection introduced by SCSI has also led to transparent improvements in storage performance, availability and functionality, because a device can improve internally while exporting the same interface. Notable examples include Redundant Arrays of Inexpensive Disks (RAID); transparent failure recovery; real-time geometry-sensitive scheduling; buffer caching, readahead, and writebehind; compression; and dynamic mapping and representation migration [Patterson88, Gibson92, Massiglia94, StorageTek94, Wilkes95, Ruemmler91, Varma95].

Currently, smart storage subsystems contain tens to hundreds of GBytes of storage, service thousands of accesses per second, and easily saturate double- and quadruple-speed SCSI buses. Under this pressure on the performance of SCSI’s physical interconnect, the industry today (1996) is seeing both uncertainty about, and rapid development in, peripheral interconnect technologies [Sachs94]. On one hand, traditional SCSI advocates are deploying shorter, faster, wider buses with peak data rates of 20, 40, and 80 MBytes/sec, increasing addressability primarily through hierarchical storage controllers [ANSI95]. On the other hand, those particularly interested in increasing the number and multiplicity of interconnected devices and hosts have replaced the physical implementation of SCSI. They use high-speed, serial, packetized ring interconnects such as Fibre Channel (up to 100 MBytes/sec) [Benner96] and SSA (up to 40 MBytes/sec per link¹) [SSA]. The disk drive industry expects the marginal cost of Fibre Channel and SSA interfaces on the disk to be typical of today’s Ethernet adapters. It expects the corresponding host adapters to cost about as much as high-performance SCSI adapters, which fall between the cost of Ethernet and ATM or FDDI interfaces [Anderson95].

1. In SSA, the independence of the links attached to each node allows multiple point-to-point transfers to proceed in parallel where these transfers are physically non-overlapping.

Figure 1: Network-attached storage, in general, provides a direct network connection between client and storage. It may or may not separate higher-level file system function from storage into a file manager machine. It may or may not have a private peripheral network linking file manager and storage devices. Although pictured here as a single-actuator disk drive, a network-attached disk is any device attached to the network and offering storage functionality. That is, for the purposes of this paper, a RAID subsystem can be considered a network-attached disk.


To take advantage of these improvements in network and storage technology, we can attach storage directly to the network. Distributed file systems using network-attached storage scale more cost-effectively for two reasons: server off-loading and network striping. By off-loading simple, expensive, and data-intensive operations from file server machines, each file manager machine can support more clients and drives. By coupling file access computation and network transfer bandwidth to each drive, aggregate transfer bandwidth scales with the number of drives (rather than with server memory and network bandwidth), and data avoids a store-and-forward copy through the server machine. In the remainder of this paper, we present an overview of network-attached storage, a taxonomy of network-attached storage architectures, and experiments that attempt to evaluate the performance of the proposed architectures.
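The scaling argument can be made concrete with a deliberately simple bottleneck model. The sketch assumes that aggregate bandwidth is the minimum of the capacities along the data path, and it ignores protocol and seek overheads. The function names and the example figures are hypothetical: 10 MBytes/sec per drive, a 40 MBytes/sec server, and 100 Mbit/sec (about 12.5 MBytes/sec) client links.

```python
def server_attached_bandwidth(n_drives: int, drive_mb_s: float, server_mb_s: float) -> float:
    """Aggregate MBytes/sec when every byte is stored and forwarded by one server."""
    return min(n_drives * drive_mb_s, server_mb_s)


def network_attached_bandwidth(n_drives: int, drive_mb_s: float,
                               n_clients: int, client_link_mb_s: float) -> float:
    """Aggregate MBytes/sec when drives send directly to clients over a switched fabric.

    Assumes the switch itself is not the bottleneck, so only the drives and
    the clients' own links limit the total.
    """
    return min(n_drives * drive_mb_s, n_clients * client_link_mb_s)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        sad = server_attached_bandwidth(n, 10.0, 40.0)
        nas = network_attached_bandwidth(n, 10.0, n_clients=16, client_link_mb_s=12.5)
        print(f"{n:2d} drives: server-attached {sad:5.1f}, network-attached {nas:5.1f} MBytes/sec")
```

Under these assumptions, the server-attached total stops growing at 4 drives. The network-attached total keeps growing until the clients' links saturate.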

Network-Attached Storage

What is Network-Attached Storage?

Figure 1 gives an overview of network-attached storage. This technology is not new: Katz called it network-based storage in a trend-predicting paper [Katz92]. The Mass Storage System Reference Model (MSSRM), an early architecture for hierarchical storage subsystems, has advocated the separation of control and data paths for almost a decade [Miller88, IEEE94]. Using a high-bandwidth network that supports direct transfers for the data path is a natural consequence [Kronenberg86, Drapeau94, Fong94, Eee95, Menasce96]. The High Performance Storage System (HPSS) [Watson95] implements the MSSRM model. It augments the model with socket-level striping of file transfers, called the Parallel Transport Protocol [Berdahl95, Wiltzius95], over the multiple network interfaces found on mainframes and supercomputers.
Our notion of network-attached storage is consistent with these projects. However, our analysis focuses on the evolution of commodity storage systems rather than supercomputing systems, and on the interaction of network-attached storage with common distributed file systems. Our goal is to chart the way network-attached storage is likely to appear in products, estimate its scalability implications, and characterize the security and file system design issues in its implementation.

Following Van Meter’s definition of a network-attached peripheral [VanMeter96], we consider networks that are shared with general local area network traffic. We do not consider single-vendor systems whose backplanes are fast, isolated local area networks [Horst95, IEEE-SCI92].

A taxonomy of network-attached storage architectures

Simply attaching storage to a network underspecifies network-attached storage’s role in the distributed file system architecture. In the following subsections we present a taxonomy for the functional composition of network-attached storage.

Case 0, the base case, is the familiar local area network with storage privately connected to the system’s file server; we call this server-attached disks. Case 1, server-integrated disks, represents a wide variety of current products that specialize hardware and software into an integrated file server product. Case 2 is network SCSI. Because current-generation disk drives already attach to peripheral networks, this obvious network-attached disk design minimizes modifications to the drive command interface, hardware and software. Finally, the processors embedded in disk controllers are rapidly becoming more capable. This creates an opportunity to restructure the drive command interface to off-load data access functionality more effectively. We call these higher-function storage devices, Case 3, network-attached secure disks.

Case 0: Most storage systems today are Server-Attached Disks (SAD)

This is the system familiar from office and campus local area networks, and is illustrated in Figure 2. Storage is attached privately to general-purpose machines that are dedicated to distributed file service.

Case 1: Optimized implementations: Server-Integrated Disks (SID)

Since file server machines often do little other than service distributed file system requests, it makes sense to construct specialized systems that perform only file system functions and no general-purpose computation. This architecture is not fundamentally different from server-attached disk, so it is not separately illustrated. Data must still move through the server machine before it reaches the network, but specialized servers can move this data more efficiently than general-purpose machines. Since high-performance distributed file service benefits the productivity of most users, this architecture occupies a high-margin (profitable) market niche [Hitz90, Hitz94]. However, this approach binds the client to the chosen distributed file system, its semantics, and its performance characteristics. For example, most server-integrated disks provide NFS file service, whose inherent performance has long been criticized [Howard88]. The marketplace has not selected a single high-performance distributed file system, so this architecture does not facilitate the development of a SCSI-like commodity market. Also, none of the existing popular distributed file systems supports server striping well, although it is a critical feature of scalable storage. Binding the client storage interface to an existing distributed file system is therefore premature at best.

Figure 2: Server-attached disk is the familiar local-area-network distributed file system. A client wanting data from storage sends a message to the file server (1), which parses it and sends a message to storage (2), which accesses the data and sends it back to the file server (3), which finally sends the requested data back to the client (4). Server-integrated disk is logically the same structure, except that the hardware and software in the file server machine are specialized to the file service function.

Case 2: Network SCSI (NetSCSI)

An approach at the other extreme from server-integrated disks is to retain as much as possible of the current dominant storage device protocol, SCSI. This is the natural evolution path for storage devices. Seagate’s Barracuda FC, for example, already provides packetized SCSI through Fibre Channel network ports to directly attached hosts [Seagate96a]. Network SCSI (NetSCSI), shown in Figure 3, is a network-attached storage architecture that makes minimal changes to the hardware and software of SCSI disks. A file manager machine translates its clients’ file system requests into SCSI commands for its disks. Rather than returning data to the file manager to be forwarded, however, the NetSCSI disk sends data directly to the client. The SCSI COPY command already supports such third-party transfers. Removing the file manager from the data’s path decreases its workload per active client. The efficient data transfer engines typical of fast drives ensure that the drive’s sustained bandwidth is available to clients through the network. They also mean that the file manager machine need not be replicated when files are striped over many disks for still higher bandwidth. However, third-party transfer changes the drive’s role in the overall security of a distributed file system. The security requirements of those file systems themselves vary, from simple accident avoidance in NFS to privacy for all transfers in AFS.
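The saving from third-party transfer can be seen by transcribing the message sequences in the captions of Figure 2 (server-attached disk) and Figure 3 (NetSCSI) and counting what passes through the server. This is a bookkeeping sketch only. The `Msg` type and the helper functions are hypothetical. Step (2) of Figure 3 is local processing rather than a message, so it is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    src: str
    dst: str
    carries_data: bool  # True if the message carries the requested file data


# Figure 2: server-attached disk read.
SAD_READ = [
    Msg("client", "server", False),   # (1) request
    Msg("server", "disk", False),     # (2) command to storage
    Msg("disk", "server", True),      # (3) data returned to the server
    Msg("server", "client", True),    # (4) data forwarded to the client
]

# Figure 3: NetSCSI read; "server" here is the file manager.
NETSCSI_READ = [
    Msg("client", "server", False),   # (1) request
    Msg("server", "disk", False),     # (3) command over the private network
    Msg("disk", "client", True),      # (4) data sent directly to the client
    Msg("disk", "server", False),     # (5) completion status
    Msg("server", "client", False),   # (6) status to the client
]


def messages_handled(seq: list, node: str) -> int:
    """Number of messages the node sends or receives."""
    return sum(1 for m in seq if node in (m.src, m.dst))


def data_bytes_handled(seq: list, node: str, request_bytes: int) -> int:
    """File-data bytes the node must send or receive (copies through its memory)."""
    return sum(request_bytes for m in seq if m.carries_data and node in (m.src, m.dst))


if __name__ == "__main__":
    for name, seq in (("SAD", SAD_READ), ("NetSCSI", NETSCSI_READ)):
        print(name, messages_handled(seq, "server"), data_bytes_handled(seq, "server", 8192))
```

In both cases the server still handles four small control messages. In NetSCSI, however, it no longer receives and resends the data, which is the data-intensive part of its per-client work.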

There are four levels of security within the NetSCSI disk model:

(1) Accident avoidance: a second, private network connects the file manager and disk, and both are locked in a physically secure room.
(2) Data transfer authentication: clients and drives are additionally equipped with a strong one-way hash function.
(3) Data transfer privacy: clients and drives are additionally equipped with encryption.
(4) Secure key management: a secure coprocessor removes the need for the disk to remain physically secure.

Figure 3: Network SCSI (NetSCSI) is a network-attached disk architecture designed for minimal change in the disk’s command interface. However, the network port on these disks may be connected to the hostile, broader Internet, so protecting the integrity of the file system structure on disk requires a second port to a private network owned by the file manager. If a client wants data from a NetSCSI disk, it sends a message (1) to the distributed file system’s file manager, which processes the request normally (2), sending a message over the private network to the NetSCSI disk (3). The disk accesses the data, transfers it directly to the client (4), and sends its completion status to the file manager over the private network (5). Finally, the file manager completes the request with a status message to the client (6).

Figure 3 shows the weakest NetSCSI security enhancement: a second network port on each disk. SCSI disks execute every command they receive without an explicit authorization check. As a result, even well-meaning clients could issue arbitrary commands and accidentally damage arbitrary parts of the file structure on disk. The drive’s second network port provides accident avoidance while allowing SCSI command interpreters to keep their execution model. A NetSCSI drive executes all commands arriving over the private network port and rejects all commands arriving on the general network. This is the architecture employed in the HPSS and SIOF projects at LLNL [Wiltzius95, Watson95]. Assuming that the file manager and NetSCSI disks are locked in a secure room, this mechanism is acceptable for the trusted network security model of the NFS distributed file system, which trusts the bits in a packet’s header to specify the originating address for authentication [Sandberg85].
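The accident-avoidance rule needs no change to the command interpreter itself, only a check on which port a command arrived on. The following is a minimal sketch of that check. It assumes a toy drive with block-granularity READ and WRITE commands. The `Port` enum, the `NetSCSIDrive` class and the command names are hypothetical stand-ins, not real SCSI opcodes.

```python
from enum import Enum
from typing import List, Optional


class Port(Enum):
    PRIVATE = "private"   # file manager-owned network
    GENERAL = "general"   # shared, possibly hostile network


class NetSCSIDrive:
    """Toy drive that, like a SCSI disk, performs no per-command authorization."""

    def __init__(self, n_blocks: int) -> None:
        self.blocks: List[bytes] = [b""] * n_blocks

    def submit(self, port: Port, op: str, lba: int, data: bytes = b"") -> Optional[bytes]:
        # The only protection: commands are accepted solely from the private port.
        if port is not Port.PRIVATE:
            raise PermissionError("command rejected: arrived on the general network port")
        if op == "WRITE":
            self.blocks[lba] = data
            return None
        if op == "READ":
            return self.blocks[lba]
        raise ValueError(f"unsupported command {op!r}")


if __name__ == "__main__":
    drive = NetSCSIDrive(n_blocks=8)
    drive.submit(Port.PRIVATE, "WRITE", 3, b"inode table")
    print(drive.submit(Port.PRIVATE, "READ", 3))
    try:
        drive.submit(Port.GENERAL, "WRITE", 3, b"oops")
    except PermissionError as err:
        print(err)
```

Any command that reaches the private port is still executed without question. This is why the model depends on the file manager and disks being physically secured.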

Because file data still travels over the potentially hostile general network, NetSCSI disks are likely to demand greater security than the accident avoidance a private network provides. Cryptographic protocols can strengthen the security of NetSCSI. At a minimum, the drive and the client can each compute a strong one-way hash function, such as SHA [NIST94], over the transferred data. This can provide data transfer authentication: the data is accepted as correct only if sender and receiver compute the same hash. Error-correcting code hardware is already applied to all data transferred to and from a disk’s magnetic media. It should therefore be possible to interpose a strong one-way hash function at the drive without reducing sustained transfer bandwidth.
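The following sketch shows hash-based transfer authentication using Python's standard `hashlib`. SHA-1 (FIPS 180-1) stands in here for the SHA of [NIST94]; a present-day design would choose SHA-256. The comparison authenticates the transfer only if the expected digest reaches the client over a path an attacker cannot alter. For illustration, the sketch assumes the digest travels in the file manager's status message (step 6 of Figure 3); the text does not specify this.

```python
import hashlib
import hmac
from typing import Tuple


def transfer_digest(data: bytes) -> bytes:
    """Strong one-way hash computed identically at drive and client."""
    return hashlib.sha1(data).digest()


def drive_send(block: bytes) -> Tuple[bytes, bytes]:
    """Drive side: return the data for the client and the digest for the trusted path."""
    return block, transfer_digest(block)


def client_accepts(received: bytes, trusted_digest: bytes) -> bool:
    """Client side: accept the data only if its digest matches the trusted one."""
    return hmac.compare_digest(transfer_digest(received), trusted_digest)


if __name__ == "__main__":
    data, digest = drive_send(b"block 42 contents")
    print(client_accepts(data, digest))                  # True
    print(client_accepts(b"block 42 c0ntents", digest))  # False
```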

Data transfer authentication between drive and client is not sufficient to provide transfer privacy. To provide privacy, a NetSCSI disk must be able to encrypt data. With encryption, NetSCSI drives can use cryptographic protocols to construct secure virtual channels over the untrusted network. However, the keys will be stored in devices vulnerable to physical attack, so the file manager and drives must still be kept in physically secure environments, such as a locked room. If we go one step further and equip NetSCSI disks with a secure coprocessor that can securely store keys [Tygar95], data can be stored in encrypted form on the disk. The disks can then be used in a variety of physically open environments. A variety of secure coprocessors are now available [Cylink95, NIST94a, Weingart87, White87, Telequip95, National94]. Some of them provide cryptographic accelerators sufficient to support single-disk bandwidths [National96].

Figure 4: Network-attached secure disks are designed to off-load more of the file system’s simple, expensive, and performance-critical operations to the storage devices. Consider one potential protocol. Before reading a file, a client requests access to that file from the file manager (1). The file manager installs into the NASD drive a capability for access to the targeted file (2) and delivers this capability to the authorized client (3). So equipped, this client may make repeated accesses to different regions of the file (4, 5) without further contacting the file manager.


Case 3: Network-attached Secure Disks (NASD)

With network-attached secure disks, shown in Figure 4, we relax the goal of minimal change from the existing SCSI interface and implementation. Instead, we focus on selecting a command interface that off-loads more of the file manager’s work onto the disk without integrating file system policy into the disk. Fast-path operations, like reads and writes, go straight to the disk. Less common operations, like namespace manipulations, go to the file manager. The disk can present a flat namespace for file-like objects, with pathname resolution split between the file manager and client. A single drive object will suffice to represent a simple client file, but the file system may also link multiple objects logically into one client file. Examples include banks of striped files [Hartman93], Macintosh-style forks, file data and metadata, and logically contiguous chunks of complex files [deJonge93].
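The division of labor can be sketched as a dispatcher together with a client file built from several drive objects. The sketch assumes that only reads and writes take the fast path, as in the text. The operation names, `ObjectRef`, `ClientFile`, and the two-fork example file are hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

FAST_PATH_OPS = {"read", "write"}                       # sent straight to the drive
MANAGER_OPS = {"create", "remove", "rename", "lookup"}  # namespace work for the file manager


@dataclass(frozen=True)
class ObjectRef:
    drive: str        # network address of the NASD drive (hypothetical naming)
    object_id: int    # name in the drive's flat object namespace


@dataclass
class ClientFile:
    """One client-visible file logically linked to several drive objects."""
    components: Dict[str, ObjectRef] = field(default_factory=dict)


def dispatch(op: str, f: ClientFile, component: str = "data") -> Tuple[str, Optional[ObjectRef]]:
    """Decide who services an operation on part of a client file."""
    if op in FAST_PATH_OPS:
        return "drive", f.components[component]
    if op in MANAGER_OPS:
        return "file_manager", None
    raise ValueError(f"unknown operation {op!r}")


if __name__ == "__main__":
    # A Macintosh-style file whose two forks live on different drives.
    mac_file = ClientFile({"data": ObjectRef("nasd-1", 17), "resource": ObjectRef("nasd-2", 4)})
    print(dispatch("read", mac_file))     # ('drive', ObjectRef(drive='nasd-1', object_id=17))
    print(dispatch("write", mac_file, "resource"))
    print(dispatch("rename", mac_file))   # ('file_manager', None)
```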

Clients directly request access to data regions in objects, so a NASD drive must have sufficient metadata on hand to map an object region to a set of magnetic media sectors. This metadata could be provided dynamically by the file manager, or it could be maintained by the drive. The latter approach asks distributed file system authors to surrender detailed control over the layout of the files they create. In return, it enables smart drives to exploit detailed knowledge of their own resources to optimize data layout, readahead, and cache management [deJonge93, Patterson95]. This is precisely the type of value-added opportunity that nimble storage vendors can exploit for market and, more importantly, customer advantage.
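When the drive maintains the metadata, mapping an object region to media sectors is an extent lookup. The following is a minimal sketch of that lookup. It assumes 512-byte sectors and an object stored as an ordered list of contiguous sector runs (extents). `ObjectExtentMap` and its methods are hypothetical, and a real drive would also handle defects, caching and allocation.

```python
import bisect
from typing import List, Tuple


class ObjectExtentMap:
    """Per-object map from object byte ranges to runs of media sectors."""

    def __init__(self, sector_size: int = 512) -> None:
        self.sector_size = sector_size
        self.starts: List[int] = []               # first object sector covered by each extent
        self.extents: List[Tuple[int, int]] = []  # (first media sector, length in sectors)

    def append_extent(self, first_sector: int, n_sectors: int) -> None:
        """Grow the object by a run of contiguous media sectors."""
        start = self.starts[-1] + self.extents[-1][1] if self.starts else 0
        self.starts.append(start)
        self.extents.append((first_sector, n_sectors))

    def sectors_for(self, offset: int, length: int) -> List[Tuple[int, int]]:
        """Return (media sector, count) runs covering bytes [offset, offset + length)."""
        if length <= 0:
            return []
        first = offset // self.sector_size
        last = (offset + length - 1) // self.sector_size  # inclusive
        runs: List[Tuple[int, int]] = []
        s = first
        while s <= last:
            i = bisect.bisect_right(self.starts, s) - 1   # extent containing sector s
            if i < 0 or s >= self.starts[i] + self.extents[i][1]:
                raise ValueError("region extends beyond the end of the object")
            base, n = self.extents[i]
            within = s - self.starts[i]
            count = min(n - within, last - s + 1)
            runs.append((base + within, count))
            s += count
        return runs


if __name__ == "__main__":
    obj = ObjectExtentMap()
    obj.append_extent(1000, 4)   # object sectors 0-3
    obj.append_extent(5000, 8)   # object sectors 4-11
    print(obj.sectors_for(1024, 2048))   # [(1002, 2), (5000, 2)]
```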

In NetSCSI, all drive commands arrive implicitly authorized by the file manager. NASD drives, in contrast, must authenticate clients and enforce their file manager’s access control decisions on client requests. NASD drives need encryption to provide client authentication, access control enforcement, and data privacy and integrity. If they are further equipped with secure coprocessors, NASD drives can also provide secure key management.

As an example of a possible NASD access sequence, consider a file read (Figure 4). Before a client issues its first read against a file, the client authenticates itself with the file manager and requests access. If the file manager grants access, the client receives the network location(s) of the NASD drive(s) containing the file's objects and time-limited capabilities to present to these drive(s). If the client is new to a drive, it also obtains a time-limited key for establishing a secure communications channel to the drive. When granting an object capability or communications key to a client, the file manager also informs the corresponding drive over an independent channel. From this point on, the client may directly request access to data on NASD drives, presenting the appropriate capability for each drive to check against the copy provided to it by the file manager.
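The read sequence above can be illustrated with a toy, in-memory model. This is a minimal sketch, not the NASD protocol: the classes `FileManager`, `Drive`, and `Capability`, the 300-second lifetime, and the random token are all hypothetical, and the secure channel and encryption described above are not modeled. It shows only the ordering the text describes: the file manager makes the access decision, registers a time-limited capability with the drive over a separate channel, and the drive later checks the client's capability against that stored copy.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    client_id: str
    object_id: int
    rights: frozenset   # e.g. frozenset({"read"})
    expires_at: float   # absolute expiry time in seconds
    token: str          # unguessable identifier for this grant (hypothetical)


class Drive:
    """Hypothetical NASD drive that checks client capabilities against its own copies."""

    def __init__(self):
        self._granted = {}  # token -> Capability received from the file manager

    def register(self, cap):
        # Called by the file manager over its independent channel to the drive.
        self._granted[cap.token] = cap

    def read(self, client_id, object_id, cap, now=None):
        now = time.time() if now is None else now
        stored = self._granted.get(cap.token)
        if stored is None or stored != cap:
            raise PermissionError("unknown or altered capability")
        if stored.client_id != client_id or stored.object_id != object_id:
            raise PermissionError("capability does not cover this request")
        if "read" not in stored.rights:
            raise PermissionError("capability does not grant read")
        if now >= stored.expires_at:
            raise PermissionError("capability expired")
        return f"<contents of object {object_id}>"


class FileManager:
    """Hypothetical file manager: makes the access decision, then issues capabilities."""

    def __init__(self, acl, lifetime=300.0):
        self.acl = acl            # (client_id, object_id) -> set of rights
        self.lifetime = lifetime  # capability lifetime in seconds (assumed value)

    def open(self, client_id, object_id, drive, now=None):
        now = time.time() if now is None else now
        rights = frozenset(self.acl.get((client_id, object_id), ()))
        if not rights:
            raise PermissionError("access denied by file manager")
        cap = Capability(client_id, object_id, rights, now + self.lifetime,
                         secrets.token_hex(16))
        drive.register(cap)  # inform the drive before the client uses the capability
        return cap


drive = Drive()
fm = FileManager({("alice", 7): {"read"}})
cap = fm.open("alice", 7, drive, now=0.0)
print(drive.read("alice", 7, cap, now=10.0))
```

Because the drive compares the presented capability with its own stored copy, a client that alters any field (for example, widening `rights`) fails the equality check; a real drive would additionally rely on the cryptographic protection described above.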

In addition to off-loading file read operations from the distributed file manager, later sections will show that NASD should also off-load file writes and reads of file attributes (which are just another region of a drive's object) to the drive. Of course, high-level file system functions such as access control lists and consistency protocols remain the purview of the file manager, which enforces its decisions through its control of the capabilities available in each NASD drive.

Experimental Methodology

To develop an understanding of the performance parameters critical to network-attached storage, we performed a series of measurements to (1) characterize the behavior and cost of AFS and NFS distributed file server functionality, and (2) provide data for analytic models of the SAD, NetSCSI, and NASD storage architectures.

We began by analyzing AFS and NFS file system traces to determine the types and frequency of operations performed by distributed file systems (Table 1 and Table 2). The NFS traces [Dahlin94] record the activity of an Auspex file server supporting 237 clients over a seven-day period at the University of California at Berkeley. These were collected using a packet filter program, rpcspy. The AFS traces record the activity of our laboratory's SPARCstation 20 AFS file server supporting 25 client workstations over a four-day period. The AFS traces were collected using a modified version of the AFS logging facility, ViceLog. Both the NFS and AFS traces document every client file system request processed by the file server, recording for each request event an arrival timestamp, a unique client host ID, and the type of primitive file system request (e.g., read, write, get/set attributes, open, close).
These traces capture the types and relative frequency of client requests, but they do not include the amount of work done by the file server for each request. To estimate this cost information, we measured NFS and AFS code paths on a current high-performance workstation. Specifically, we used Digital Equipment's ATOM binary annotation tool [Srivastava94] to identify the code paths traversed by each type of primitive file system operation on an Alpha workstation, and we used the Alpha's on-chip cycle counters to measure the amount of work (in CPU cycles) required for each type of primitive operation.

Cost measurements were taken in two steps. For NFS, the entry and exit points of each procedure in the Digital Unix 3.2c kernel were annotated with ATOM to produce a dynamic call graph. The file server machine (a DEC 3000/400 with a 133 MHz Alpha 21064 processor and 64 MB of RAM, running Digital's NFS version 3 server) ran the annotated kernel while NFS client requests were made to the file server, producing a dynamic call graph for each type of primitive NFS operation. We repeated the process for AFS, ATOMizing the AFS user-level server code to produce AFS server call graphs (using a DEC 3000/500 with a 150 MHz Alpha 21064 processor and 128 MB of RAM, running Transarc's AFS version 3.4 server).

After identifying the specific AFS or NFS routines invoked for each type of primitive file system operation, we re-annotated the kernel (and the AFS server code), limiting annotation to the critical components of each primitive operation's code path. This significantly reduced ATOM overhead, minimizing measurement distortion. Each primitive file system request was applied to the selectively annotated kernel (and AFS server) 100 times, generating traces that recorded the code-path execution time. The Alpha's on-chip counters provided single-cycle accuracy for these measurements. For NFS, we calculated and removed the ATOM tracing overhead; for AFS, the variability of operation times was too large for the ATOM overhead to be calculated and eliminated. We repeated this process for each operation type to generate the average cost of each primitive file system request, parameterized by request size where appropriate.
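The averaging step can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the trial values are synthetic rather than measurements from this study, and the relative spread is only one possible way to quantify the trial-to-trial variability mentioned above.

```python
from statistics import mean, stdev


def average_cost(trial_cycles, overhead_cycles=0.0):
    """Mean cycles per request over repeated trials, minus an estimated per-request
    instrumentation overhead (leave it at 0 when the overhead cannot be separated out)."""
    if not trial_cycles:
        raise ValueError("need at least one trial")
    return mean(trial_cycles) - overhead_cycles


def relative_spread(trial_cycles):
    """Sample standard deviation as a fraction of the mean (needs at least 2 trials)."""
    return stdev(trial_cycles) / mean(trial_cycles)


# Synthetic cycle counts for one request type and size (not data from this paper).
trials = [1000, 1100, 900]
print(average_cost(trials, overhead_cycles=50.0))  # 950.0
print(round(relative_spread(trials), 3))           # 0.1
```

When the spread across trials is large compared with the estimated overhead, subtracting the overhead changes the average by less than the measurement noise, which is the situation the text describes for AFS.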

Experimental Results

Relative importance of NFS and AFS server operations

Table 1 and Table 2 report the frequency distribution of various server operations for the NFS and AFS traces, respectively. The top of each table lists the operation names, describes their functions, and reports their frequencies and total number of occurrences in the corresponding trace. This data shows that directory read requests (DirRead) are the most frequently executed NFS operations (43%), while attribute read requests (FetchStatus) are the most frequently invoked AFS operations (49%). The results also show that the NFS data-moving operations, BlockRead and BlockWrite, account for only 16.9% of all requests. Similarly, AFS's data-moving operations, FetchData and StoreData, account for 23.5% of all AFS requests.

NFS Operation   Percent   Quantity (x10^6)   Description
AttrRead        36.5      11.1               Get metadata
AttrWrite       2.7       0.83               Update metadata
BlockRead       14.0      4.25               Fetch data from server
BlockWrite      2.9       0.88               Send data to server
DirRead         43.1      13.1               Read directory entries, convert filename to filehandle, etc.
DirReadWrite    0.7       0.21               Creation of files/directories, file renaming, links, etc.
DeleteWrite     0.1       0.04               Deletion of files/directories

Operation                     Cycles (x10^3)
Get Attribute                 33
Set Attributes                63
Directory Read (1 entry)      63
Directory Read (40 entries)   105
Directory Lookup              50
Access                        37
Remove                        135

Size of Operation   Read (x10^3 cycles)   Write (x10^3 cycles)
1 byte              54.6                  117.5
1K bytes            61.1                  125.1
2K bytes            68.2                  134.5
4K bytes            78.0                  147.8
8,000 bytes         100.9                 199.3

Table 1: Distribution and cycle costs for each type of NFS operation. All measurements were taken on a DEC 3000/400 (133 MHz) NFS server running an ATOMized Digital Unix 3.2c kernel. The server's caches were warmed, and results from trials that produced misses in the local cache were discarded.

While frequency numbers do not emphasize data-moving operations, the cycle count data shown in the bottom of Table 1 and Table 2 indicate that data movement places a significant burden on the server. An NFS 8,000-byte read requires over 100K cycles, and an NFS write requires almost 200K cycles. AFS incurs more cost for reads and writes: an 8K fetch takes 330K cycles and a store takes 410K cycles. There are other expensive operations: an NFS directory read (40 entries) requires 105K cycles, an NFS remove (8K file) consumes 135K cycles, and an AFS BulkStatus (30 entries) requires 1,313K cycles.
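Cycle counts translate into processing time through the clock rates of the measurement machines (133 MHz for the NFS server, 150 MHz for the AFS server). The sketch below performs that conversion for the costs quoted above; the helper name is hypothetical, and the result is server processing time along the measured code path, not end-to-end request latency.

```python
def cycles_to_microseconds(cycles, clock_mhz):
    """Convert a cycle count to microseconds on a processor clocked at clock_mhz."""
    return cycles / clock_mhz


# (operation, cycles, clock rate of the machine it was measured on, in MHz)
measurements = [
    ("NFS 8,000-byte read", 100.9e3, 133),
    ("NFS 8,000-byte write", 199.3e3, 133),
    ("AFS 8K FetchData", 330e3, 150),
    ("AFS 8K StoreData", 410e3, 150),
]
for name, cycles, mhz in measurements:
    print(f"{name}: {cycles_to_microseconds(cycles, mhz):.0f} us")
```

At these clock rates, the 8,000-byte NFS read and write cost roughly 0.76 ms and 1.5 ms of server processing, and the 8K AFS fetch and store roughly 2.2 ms and 2.7 ms.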

To estimate the relative importance of the various primitive operations in the total workload applied to a file server, we estimated the total amount of work done by the server for each request type during the execution of each trace. Specifically, we multiplied each operation type's number of occurrences by its measured average cycle count to estimate the total server workload for that operation type. Expressing each operation type's total workload as a percentage of the total over all operation types gives our estimate of the relative importance of the primitive operations.
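The estimate is a product of counts and per-request costs followed by a normalization. As a check on units, the sketch below applies it to the size-independent AFS operations, using the counts (in thousands) and cycle costs (in thousands of cycles) from Table 2; the function names are hypothetical. Rounded to one decimal, the products (5.8, 1.4, 0.6, 0.5, and 0.7 x10^9 cycles) agree with the corresponding SAD entries of Table 3.

```python
# Per-type counts (thousands of requests) and average costs (thousands of cycles)
# for the size-independent AFS operations, as listed in Table 2.
AFS_COUNTS_K = {"FetchStatus": 45.7, "StoreStatus": 7.4, "CreateFile": 2.0,
                "Rename": 1.7, "Others": 3.2}
AFS_CYCLES_K = {"FetchStatus": 128, "StoreStatus": 189, "CreateFile": 307,
                "Rename": 285, "Others": 227}


def workload_by_type(counts, cycles_per_request):
    """Total server work per operation type: occurrences x average cycles per request."""
    return {op: counts[op] * cycles_per_request[op] for op in counts}


def relative_importance(workload):
    """Each operation type's share (in percent) of the summed workload."""
    total = sum(workload.values())
    return {op: 100.0 * w / total for op, w in workload.items()}


# thousands x thousands = millions of cycles; divide by 1,000 to express in units of 10^9
totals_g = {op: w / 1e3 for op, w in workload_by_type(AFS_COUNTS_K, AFS_CYCLES_K).items()}
for op, g in totals_g.items():
    print(f"{op:12s} {g:4.1f} x10^9 cycles")
```

Here `relative_importance` normalizes over only the five operations passed in; the percentages in Table 3 normalize over all operation types, including the data-moving ones.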

As shown in the server-attached disk (SAD) columns of Table 3, the data-moving operations contribute 27% of the total NFS server workload and 59% of the total AFS server workload. This suggests that the performance gained by directly moving data between client and disk may be limited by other file server functionality [Drapeau94]. As the next subsection shows, this observation limits the benefit of NetSCSI for off-loading file manager workload and motivates the design of a NASD drive interface.
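The 27% and 59% figures follow directly from the SAD columns of Table 3. A minimal sketch (the function name is hypothetical):

```python
# SAD-column workloads (units of 10^9 cycles) from Table 3.
NFS_SAD = {"Attr Read": 370.6, "Attr Write": 52.3, "Block Read": 367.7,
           "Block Write": 130.5, "Dir Read Lookup": 254.6,
           "Dir Read non lookup": 675.4, "Dir RW": 13.6, "Delete Write": 5.1}
AFS_SAD = {"FetchStatus": 5.8, "FetchData": 10.0, "BulkStatus": 1.7,
           "StoreStatus": 1.4, "StoreData": 6.5, "CreateFile": 0.6,
           "Rename": 0.5, "RemoveFile": 0.6, "Others": 0.7}


def share_percent(workload, ops):
    """Percentage of the total workload contributed by the named operations."""
    return 100.0 * sum(workload[op] for op in ops) / sum(workload.values())


nfs_data = share_percent(NFS_SAD, ["Block Read", "Block Write"])  # 498.2 / 1,869.8
afs_data = share_percent(AFS_SAD, ["FetchData", "StoreData"])     # 16.5 / 27.8
print(f"NFS data-moving share: {nfs_data:.0f}%")
print(f"AFS data-moving share: {afs_data:.0f}%")
```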

AFS Operation   Percent   Quantity (x10^3)   Description
FetchStatus     49.0      45.7               Query metadata information on a directory or file (creation date, last modified time, permissions, etc.)
FetchData       18.3      17.1               Send data from the AFS server to the requesting client
BulkStatus      10.6      9.9                Perform a group of FetchStatus operations and package all the results in a single reply
StoreStatus     7.9       7.4                Update metadata information (last modified date, file permissions, etc.)
StoreData       5.3       4.9                Store data sent by a client into a file on the AFS server
CreateFile      2.1       2.0                Create a new file in the AFS server namespace
Rename          1.9       1.7                Move a file from one location in the AFS server namespace to another location
RemoveFile      1.5       1.4                Delete a file stored on the AFS server
Others          3.4       3.2                Operations that occurred with a very low frequency (ACL manipulation, symbolic links, directory creation/deletion, lock management, volume management, etc.)

Cycles by size of operation (x10^3 cycles; sizes in bytes)
Operation         0      1    512     1K     2K     4K     8K    16K    32K    64K     1M
StoreData       259      -    291    303    363    371    410    578    750  1,242 16,752
FetchData         -    179    192    191    204    270    330    439    788  1,544      -
RemoveFile        -    331    396    396    410    411    412    414    429    452  1,053

BulkStatus Size   Cycles (x10^3)
1                 151
2                 154
3                 178
10                324
20                578
25                1,313

Operation     Cycles (x10^3)
FetchStatus   128
StoreStatus   189
CreateFile    307
Rename        285
Others        227

Table 2: Distribution and cycle costs for each type of AFS operation. Cycle counts were taken on a DEC 3000/500 (150 MHz) running an ATOMized AFS version 3.4 server and averaged over 100 trials. The server's caches were warmed, and results from trials that produced misses in the local file system cache were discarded. The number of cycles for "Others" (which mainly consists of operations for manipulating callbacks, links, access control lists, and directories) was estimated as the average of the four size-independent operations that were measured individually (namely, FetchStatus, StoreStatus, CreateFile, and Rename). Because the variation induced by different levels of instrumentation was insignificant compared to the variation between different trials at the same level of instrumentation, we did not estimate the instrumentation cost.

Comparing SAD, NetSCSI, and NASD server performance

Based on the analytic model of server workload in SAD systems described above, and the outline of NetSCSI and NASD drives in the last section, we project the total file manager workload in distributed file systems using network-attached storage. The results of this projection are shown in the NetSCSI and NASD columns of Table 3.


In the NetSCSI model, the read/write data path bypasses the file server on data transfers. However, NetSCSI still requires the server to process every read and write request, specifically to authorize access and determine the block's location on disk. We modeled the manager workload with NetSCSI drives by eliminating the data movement portion of client reads and writes: the work of each write was estimated by the SAD work done by a zero-byte store, and the work of each read by the SAD work done by a one-byte read.
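For AFS, this rule reduces each FetchData to the cost of a one-byte fetch and each StoreData to the cost of a zero-byte store. The sketch below applies it using Table 2's counts (thousands) and costs (thousands of cycles), reading 179K cycles as the one-byte FetchData cost and 259K cycles as the zero-byte StoreData cost; the names are hypothetical. The results round to the 3.1 and 1.3 x10^9 cycles shown in the NetSCSI column of Table 3.

```python
# AFS request counts (thousands) and minimal-transfer costs (thousands of cycles), from Table 2.
FETCH_COUNT_K, STORE_COUNT_K = 17.1, 4.9
ONE_BYTE_FETCH_K = 179    # FetchData cost in the 1-byte column
ZERO_BYTE_STORE_K = 259   # StoreData cost at 0 bytes


def netscsi_manager_work(count_k, minimal_cost_k):
    """Manager work in units of 10^9 cycles when only the request's control path
    (authorization, block lookup) remains on the server."""
    return count_k * minimal_cost_k / 1e3


fetch = netscsi_manager_work(FETCH_COUNT_K, ONE_BYTE_FETCH_K)
store = netscsi_manager_work(STORE_COUNT_K, ZERO_BYTE_STORE_K)
print(f"FetchData: {fetch:.1f} x10^9 cycles")  # 3.1
print(f"StoreData: {store:.1f} x10^9 cycles")  # 1.3
```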
NFS
Operation           SAD (x10^9)      %   NetSCSI (x10^9)     %*   NASD (x10^9)     %*
Attr Read                 370.6    20%             370.6    20%            0.0     0%
Attr Write                 52.3     3%              52.3     3%           52.3     3%
Block Read                367.7    20%             231.9    12%            0.0     0%
Block Write               130.5     7%             103.7     6%           56.0     3%
Dir Read Lookup           254.6    14%             254.6    14%            0.0     0%
Dir Read non lookup       675.4    36%             675.4    36%            0.0     0%
Dir RW                     13.6     0%              13.6     0%           13.6     0%
Delete Write                5.1     0%               5.1     0%            2.4     0%
Open                        0.0     0%               0.0     0%           41.1     2%
Total                   1,869.8   100%           1,707.2    91%          165.4     9%

AFS
Operation           SAD (x10^9)      %   NetSCSI (x10^9)     %*   NASD (x10^9)     %*
FetchStatus                 5.8    21%               5.8    21%            0.0     0%
FetchData                  10.0    36%               3.1    11%            0.0     0%
BulkStatus                  1.7     6%               1.7     6%            0.0     0%
StoreStatus                 1.4     5%               1.4     5%            1.4     5%
StoreData                   6.5    23%               1.3     5%            0.9     3%
CreateFile                  0.6     2%               0.6     2%            0.6     2%
Rename                      0.5     2%               0.5     2%            0.5     2%
RemoveFile                  0.6     2%               0.6     2%            0.3     1%
Others                      0.7     3%               0.7     3%            0.7     3%
Open                        0.0     0%               0.0     0%            3.3    12%
Total                      27.8   100%              15.7    56%            7.7    28%

Table 3: Projected workload of the NFS (top) and AFS (bottom) distributed file manager. This table reports the estimates of a simple analytic model used to compare the relative scalability of file managers in SAD, NetSCSI, and NASD environments. Because the NFS and AFS traces are of different servers and lengths, the two systems are compared using the percentage of cycles for each operation, not the total number of cycles for each operation.

* Percentages in the NetSCSI and NASD columns express each NetSCSI or NASD operation's cycle count as a percentage of the SAD total cycle count.


For AFS, the file manager in a NetSCSI system executes only about half as many cycles as in the SAD system. For NFS, the improvement is much smaller for two reasons: NFS has a high frequency of directory and attribute read operations, which are not significantly off-loaded in a NetSCSI system, and NFS data transfers, typically 8 Kbytes per request, are smaller than AFS data transfers, which can be as large as 64 Kbytes per request.

In our model of a NASD-based distributed file system, all read operations, including attribute and directory reads, are sent directly to the NASD drive by clients. Because clients locate attribute and directory data in the NASD object namespace (by convention or as a result of an open request to the file manager), NASD drives need not distinguish data, attribute, and directory operations. For attribute and directory writes, however, we pessimistically assume that clients must send these requests to their file managers. To estimate the manager's pre-authorization and capability setup work, we introduced an open request (synthesized in our traces to occur whenever a file was touched after at least 30 seconds of inactivity) that requires manager work comparable to an attribute write operation. Data write operations, whose data is sent directly to the NASD drive by clients, and file remove operations, whose object deallocation work is done by the NASD drive, are also estimated to require manager work comparable to an attribute write operation. Finally, we assume that NFS clients in NASD systems replace directory lookup operations with NASD (directory) object reads and execute the lookup locally.
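The two NASD modeling rules above can be sketched as follows. This is a minimal illustration under assumptions the text does not specify: inactivity is tracked per file (across all clients), and a file's first appearance in the trace counts as an open. The second function applies the attribute-write cost to AFS StoreData and RemoveFile requests, using Table 2's counts and the StoreStatus cost (189K cycles); the results round to the 0.9 and 0.3 x10^9 cycles in Table 3's NASD column. All names are hypothetical.

```python
def synthesize_opens(touches, idle_seconds=30.0):
    """Count synthetic open requests in a trace of (timestamp, file_id) touches.

    Assumptions (not specified in the text): inactivity is tracked per file across
    all clients, and the first touch of each file in the trace counts as an open.
    """
    last_touch = {}
    opens = 0
    for t, file_id in sorted(touches):
        prev = last_touch.get(file_id)
        if prev is None or t - prev >= idle_seconds:
            opens += 1
        last_touch[file_id] = t
    return opens


def nasd_manager_work(count_k, attr_write_cost_k):
    """Manager work (units of 10^9 cycles) when each request costs about one attribute write."""
    return count_k * attr_write_cost_k / 1e3


touches = [(0.0, "a"), (10.0, "a"), (45.0, "a"), (46.0, "b"), (80.0, "a")]
print(synthesize_opens(touches))  # 4: a@0, a@45, b@46, a@80

# AFS StoreData and RemoveFile counts (thousands) with the StoreStatus cost (189K cycles).
print(round(nasd_manager_work(4.9, 189), 1))  # 0.9
print(round(nasd_manager_work(1.4, 189), 1))  # 0.3
```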

For AFS, Table 3 shows that NASD systems may reduce file manager workload by a factor of 2 over NetSCSI systems, or a factor of 4 when compared to SAD systems. For NFS, where directory and attribute reads dominate manager workload, file managers using NASD drives may benefit from a factor of 10 decrease in load.
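The reduction factors quoted above are ratios of the Total rows of Table 3. A short sketch (names hypothetical):

```python
# Total file manager workload (units of 10^9 cycles) from the Total rows of Table 3.
TOTALS = {
    "NFS": {"SAD": 1869.8, "NetSCSI": 1707.2, "NASD": 165.4},
    "AFS": {"SAD": 27.8, "NetSCSI": 15.7, "NASD": 7.7},
}


def reduction_factor(totals, baseline, alternative):
    """How many times less manager work `alternative` needs than `baseline`."""
    return totals[baseline] / totals[alternative]


for fs, t in TOTALS.items():
    print(fs,
          f"SAD/NetSCSI={reduction_factor(t, 'SAD', 'NetSCSI'):.1f}",
          f"NetSCSI/NASD={reduction_factor(t, 'NetSCSI', 'NASD'):.1f}",
          f"SAD/NASD={reduction_factor(t, 'SAD', 'NASD'):.1f}")
```

For AFS this gives about 2.0 for NASD relative to NetSCSI and 3.6 relative to SAD; for NFS, about 11.3 for NASD relative to SAD, which the text rounds to factors of 2, 4, and 10.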

Conclusion and Future Directions

Network-attached storage, enabling direct transfers between client and storage, can substantially increase distributed file system scalability while simultaneously enabling striped storage to satisfy the bursty, high-bandwidth demands of the increasingly high-performance clients populating local area networks.

In this paper we presented a simple classification of storage architectures for distributed file systems. This classification contains four models. The traditional server-attached disk (SAD) model is our base case. Server-integrated disk systems include the familiar NFS server products, which are architecturally identical to SAD systems but use hardware and software designed specifically for executing file service. We do not emphasize this model because it binds storage products to a particular choice of distributed file system.

The remaining two storage models exploit network-attached storage. Network SCSI (NetSCSI) drives are very similar to current SCSI disks in that all file requests go through the distributed file manager, but the resulting data transfers go directly between client and drive. For AFS workloads, this may reduce file manager workload by about 50%. Different security models can be provided using NetSCSI, depending on the cryptographic support provided in the drive. Network-attached secure disks (NASD) support storage semantics intermediate between those of block-level protocols like SCSI and those of distributed file systems like NFS or AFS. The partitioning of file system functionality between NASD drive and file manager is optimized to reduce file manager load while maintaining system flexibility. To operate securely in the face of this partitioning, NASD drives rely on encryption and key management support. By off-loading data reads and writes as well as attribute and directory reads, NASD drives may reduce distributed file system server load by a factor of 4 for AFS to 10 for NFS.

Our analysis focuses on describing the distinct methods of organizing storage architecture and estimating the potential improvement each promises for distributed file systems. With the promising results given here, our future directions are clear. We plan to demonstrate that distributed file systems can be implemented around network-attached storage, preserving powerful security models and yielding considerable scalability and client performance advantages. Along this path, many open questions remain. Our NASD model, in particular, expects a disk drive to be capable of computation not normally associated with cost-sensitive commodity peripherals. Drive micro-architectures and software structures must be developed and demonstrated. Server caching in traditional systems is a side effect of data store-and-forward through the server. With network-attached storage, we lose this benefit, and we must evaluate new caching strategies, including distributing the caches among storage or providing separate cache servers. In the NASD models we have presented, we still assume that clients "open" files by contacting the distributed file system server, which sets up the state needed for direct transfers to and from storage and allows the file manager to handle consistency. A clear improvement, similar to the effect of client caching in AFS, might be provided by pre-authorization or group-authorization schemes.